Mean Field Bayes Backpropagation: scalable training of multilayer neural networks with binary weights
Significant success has been reported recently using deep neural networks for
classification. Such large networks can be computationally intensive, even
after training is over. Implementing these trained networks in hardware chips
with a limited precision of synaptic weights may improve their speed and energy
efficiency by several orders of magnitude, thus enabling their integration into
small and low-power electronic devices. With this motivation, we develop a
computationally efficient learning algorithm for multilayer neural networks
with binary weights, assuming all the hidden neurons have a fan-out of one.
This algorithm, derived within a Bayesian probabilistic online setting, is
shown to work well for both synthetic and real-world problems, performing
comparably to algorithms with real-valued weights, while retaining
computational tractability.
Conductance-Based Neuron Models and the Slow Dynamics of Excitability
In recent experiments, synaptically isolated neurons from rat cortical culture were stimulated with periodic extracellular fixed-amplitude current pulses for extended durations of days. The neuron’s response depended on its own history, as well as on the history of the input, and was classified into several modes. Interestingly, in one of the modes the neuron behaved intermittently, exhibiting irregular firing patterns that changed in a complex and variable manner over the entire range of experimental timescales, from seconds to days. With the aim of developing a minimal biophysical explanation for these results, we propose a general scheme that, given a few assumptions (mainly, a timescale separation in kinetics), closely describes the response of deterministic conductance-based neuron models under pulse stimulation, using a discrete-time piecewise-linear mapping that is amenable to detailed mathematical analysis. Using this method, we reproduce the basic modes exhibited by the neuron experimentally, as well as the mean response in each mode. Specifically, we derive precise closed-form input-output expressions for the transient timescale and firing rates, which are expressed in terms of experimentally measurable variables and agree with the experimental results. However, the mathematical analysis shows that the resulting firing patterns in these deterministic models are always regular and repeatable (i.e., no chaos), in contrast to the irregular and variable behavior displayed by the neuron in certain regimes. This fact, together with the sensitive near-threshold dynamics of the model, indicates that intrinsic ion channel noise has a significant impact on the neuronal response and may help reproduce the experimentally observed variability, as we also demonstrate numerically. In a companion paper, we extend our analysis to stochastic conductance-based models and show how these can be used to reproduce the details of the observed irregular and variable neuronal response.
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems,
with homogeneous linear predictors on linearly separable datasets. We show the
predictor converges to the direction of the max-margin (hard margin SVM)
solution. The result also generalizes to other monotone decreasing loss
functions with an infimum at infinity, to multi-class problems, and to training
a weight layer in a deep network in a certain restricted setting. Furthermore,
we show this convergence is very slow, and only logarithmic in the convergence
of the loss itself. This can help explain the benefit of continuing to optimize
the logistic or cross-entropy loss even after the training error is zero and
the training loss is extremely small, and, as we show, even if the validation
loss increases. Our methodology can also aid in understanding implicit
regularization in more complex models and with other optimization methods.
Comment: Final JMLR version, with improved discussions over v3. Main
improvements in journal version over conference version (v2 appeared in
ICLR): We proved the measure zero case for main theorem (with implications
for the rates), and the multi-class cas
- …
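The claim in the abstract of "The Implicit Bias of Gradient Descent on Separable Data" can be checked numerically on a toy problem. The sketch below is not taken from the paper: the three data points, the step size, and the names `DATA`, `W_MAX_MARGIN`, `cosine`, and `run_gd` are all assumptions chosen for illustration. It runs plain gradient descent on the unregularized logistic loss with a homogeneous linear predictor (no bias term) and reports how well the direction of the weight vector aligns with the hard-margin SVM direction, which for this data can be worked out by hand.

```python
import math

# Hypothetical toy data, linearly separable through the origin.
# Each entry is (x, y) with label y in {+1, -1}; the predictor is w . x (no bias).
DATA = [((2.0, 0.0), +1), ((0.0, -1.0), -1), ((3.0, 1.0), +1)]

# Hard-margin SVM solution for this data, derived by hand: minimise ||w||
# subject to y * (w . x) >= 1, i.e. 2*w1 >= 1, w2 >= 1, 3*w1 + w2 >= 1.
# The first two constraints are active, giving w = (0.5, 1.0).
W_MAX_MARGIN = (0.5, 1.0)


def cosine(u, v):
    """Cosine of the angle between two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))


def run_gd(checkpoints, lr=0.1):
    """Gradient descent on the summed logistic loss, starting from w = 0.

    Returns a dict mapping each checkpoint step to a pair
    (training loss, cosine between w and the max-margin direction).
    """
    w = [0.0, 0.0]
    out = {}
    for step in range(1, max(checkpoints) + 1):
        grad = [0.0, 0.0]
        for x, y in DATA:
            margin = y * (w[0] * x[0] + w[1] * x[1])
            # Gradient of log(1 + exp(-margin)) with respect to w
            # is -y * x / (1 + exp(margin)).
            coeff = -y / (1.0 + math.exp(margin))
            grad[0] += coeff * x[0]
            grad[1] += coeff * x[1]
        w[0] -= lr * grad[0]
        w[1] -= lr * grad[1]
        if step in checkpoints:
            loss = sum(
                math.log1p(math.exp(-y * (w[0] * x[0] + w[1] * x[1])))
                for x, y in DATA
            )
            out[step] = (loss, cosine(w, W_MAX_MARGIN))
    return out


for step, (loss, cos) in sorted(run_gd({10, 1000, 100000}).items()):
    print(f"step={step:>6}  loss={loss:.3e}  cos(w, w_max_margin)={cos:.5f}")
```

On this toy problem the loss should fall by orders of magnitude between checkpoints while the cosine approaches 1 only gradually, which is the qualitative behaviour the abstract describes: the direction converges much more slowly than the loss. The third point is deliberately not a support vector, so the early iterates point away from the max-margin direction before drifting toward it.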